Validate synthetic ID format on inbound header and cookie values#508
Validate synthetic ID format on inbound header and cookie values#508
Conversation
Inbound synthetic IDs from the x-synthetic-id header and synthetic_id cookie were accepted without validation. An attacker could inject arbitrary strings — including very long values, special characters, or newlines — which were then set as response headers, cookies, and forwarded to third-party APIs. Adds a private is_valid_synthetic_id() validator enforcing the canonical format (64 lowercase hex chars + '.' + 6 alphanumeric chars). The length check is O(1) and runs first to bound all downstream work. Invalid values are silently discarded and a fresh ID is generated in their place; the raw value is never written to logs. Also adds a debug_assert! in generate_synthetic_id() to catch any future regression in the generator, moves VALID_SYNTHETIC_ID to test_support so it is shared across all test modules, and demotes synthetic ID values from INFO to DEBUG in log output to avoid recording pseudonymous identifiers in production log pipelines. Closes #412
ChristianPavilonis
left a comment
There was a problem hiding this comment.
Clean, well-executed security hardening. Validation logic is correct and thoroughly tested. Approving with one suggestion inline.
Strengths:
- O(1) length check gates all downstream work — good defense against oversized input on an edge server
- Lowercase hex enforcement prevents intermediary case-normalization from creating fake-valid IDs
debug_assert!in the generator ensures validator and generator never drift apart- Logging hygiene — invalid values logged with
len={}only, never the raw attacker-controlled string - Excellent test coverage with 8+ new/updated test cases
- Shared
VALID_SYNTHETIC_IDconstant eliminates test fragility across 3 modules - CHANGELOG and docs properly updated
Note: The lol_html bump (e295f3a) in Cargo.toml/Cargo.lock appears to be a merge artifact from main — unrelated to the synthetic ID validation. Not a problem, just flagging it.
aram356
left a comment
There was a problem hiding this comment.
Summary
Well-executed security hardening PR that validates synthetic ID format on inbound header and cookie values. The validation logic is correct and test coverage is thorough. One gap in the revocation path needs attention.
Blocking
🔧 wrench
- Revocation path bypasses validation:
existing_ssc_cookieinpublisher.rs:379-387is read from the raw cookie jar beforeget_or_generate_synthetic_idruns validation. This raw, unvalidated value is then: (1) logged atlog::info!level (line 381) — potentially logging attacker-controlled content, and (2) passed todelete_consent_from_kvas a KV store key (line 387) — allowing a crafted cookie value to target arbitrary KV keys for deletion. This is the same class of injection this PR aims to fix. Suggested fix: makeis_valid_synthetic_idpub(crate)and validateexisting_ssc_cookiebefore use, or restructure so the already-validated synthetic ID is reused for revocation. At minimum, track as a follow-up issue.
Non-blocking
🏕 camp site
- Clarify old vs new validation layers (
cookies.rs:38): The oldsynthetic_id_has_only_allowed_chars(permits[a-zA-Z0-9._-]) remains for outbound cookie sanitization, while the newis_valid_synthetic_idinsynthetic.rsis strictly tighter for inbound validation. A brief comment noting this layered relationship would help future readers.
⛏ nitpick
- PR body references stale
crates/common/src/paths: actual paths arecrates/trusted-server-core/src/... - PR checklist says "tracing macros": project uses
logcrate, nottracing
CI Status
- fmt: PASS
- clippy: PASS
- rust tests: PASS
- js tests: PASS
aram356
left a comment
There was a problem hiding this comment.
Summary
Clean, well-designed security fix that validates inbound synthetic IDs from the x-synthetic-id header and synthetic_id cookie. The is_valid_synthetic_id() validator is correct, defense-in-depth layering across synthetic.rs, cookies.rs, and publisher.rs is sound, and test coverage is thorough.
Non-blocking
📌 out of scope
- Synthetic ID still logged at INFO in proxy.rs redirect handler: The redirect handler at
proxy.rs:717still logs the validated synthetic ID value atlog::info!. Low risk since the value is already validated, but inconsistent with the PR's stated goal of demoting ID values to DEBUG. Worth a follow-up.
🤔 thinking
split_once('.')vs oldsplit('.')is subtly correct: The new approach naturally handles multi-dot values because the total length constraint (71 = 64 + 1 + 6) means extra dots fail the alphanumeric suffix check. Thedot_in_suffixtest covers this. Sound design choice.
👍 praise
- Excellent defense-in-depth in publisher.rs revocation path (
publisher.rs:382-399): Always expires the cookie regardless of validity, only passes validated IDs todelete_consent_from_kv, logs only the length of invalid values. Prevents malicious cookie values being used as KV keys. - Synthetic ID values removed from log output (
synthetic.rs): Old code logged actual ID values at trace level; new code logs only structural information. Eliminates log-exfiltration of pseudonymous identifiers. - O(1) length guard before character scanning (
synthetic.rs:42): Constant-time length check before iteration is a nice DoS mitigation for oversized inputs. - Comprehensive test coverage: Covers invalid header/cookie rejection, header precedence, fallthrough behavior, generation on invalid input, and edge cases (uppercase hex, oversized input, empty string, dot in suffix, extra segments, missing suffix, non-hex, non-alphanumeric suffix).
⛏ nitpick
- CHANGELOG says "64-hex-hmac" but validator requires lowercase hex (
CHANGELOG.md:12): Saying "64-lowercase-hex-hmac" would be more precise. Minor since the code is authoritative.
CI Status
- fmt: PASS
- clippy: PASS
- rust tests: PASS
- js tests: PASS
Summary
x-synthetic-idheader andsynthetic_idcookie were accepted without validation, allowing injection of arbitrary strings into response headers, cookies, and third-party API callsis_valid_synthetic_id()validator (64 lowercase hex +.+ 6 alphanumeric) with an O(1) length check first to bound all downstream work; invalid values are silently discarded and a fresh ID is generated in their placeINFOtoDEBUGto avoid recording pseudonymous identifiers in production log pipelinesChanges
crates/common/src/synthetic.rsis_valid_synthetic_id()production validator; validate inget_synthetic_id(); adddebug_assert!in generator; demote ID values todebug!; new rejection + fallthrough testscrates/common/src/test_support.rsVALID_SYNTHETIC_IDconstantcrates/common/src/proxy.rsVALID_SYNTHETIC_IDcrates/common/src/integrations/registry.rsdocs/guide/synthetic-ids.mdCHANGELOG.md### Securityentry under[Unreleased]Closes
Closes #412
Test plan
cargo test --workspacecargo clippy --all-targets --all-features -- -D warningscargo fmt --all -- --checkcd crates/js/lib && npm run formatcd docs && npm run formatChecklist
unwrap()in production code — useexpect("should ...")tracingmacros (notprintln!)